Neurocontrol by Reinforcement Learning
نویسنده
چکیده
Reinforcement learning (RL) is a model-free tuning and adaptation method for control of dynamic systems. Contrary to supervised learning, based usually on gradient descent techniques, RL does not require any model or sensitivity function of the process. Hence, RL can be applied to systems that are poorly understood, uncertain, nonlinear or for other reasons untractable with conventional methods. In reinforcement learning, the overall controller performance is evaluated by a scalar measure, called reinforcement. Depending on the type of the control task, reinforcement may represent an evaluation of the most recent control action or, more often, of an entire sequence of past control moves. In the latter case, the RL system learns how to predict the outcome of each individual control action. This prediction is then used to adjust the parameters of the controller. The mathematical background of RL is closely related to optimal control and dynamic programming. This paper gives a comprehensive overview of the RL methods and presents an application to the attitude control of a satellite. Some well known applications from the literature are reviewed as well.
منابع مشابه
The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming
In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do “clipping” on the motion of the agent in the final time step of the trajectory. By clipping we mean tha...
متن کاملPii: S0378-4754(99)00117-2
A novel approach to adaptive direct neurocontrol is discussed in this paper. The objective is to construct an adaptive control scheme for unknown time-dependent nonlinear plants without using a model of the plant. The proposed approach is neural-network based and combines the self-tuning principle with reinforcement learning. The control scheme consists of a controller, a utility estimator, an ...
متن کاملAdaptive Critic Designs - Neural Networks, IEEE Transactions on
We discuss a variety of adaptive critic designs (ACD’s) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: Heuristic dynamic programming (HDP), dual heu...
متن کاملDynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)
In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic Environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...
متن کاملLearning Spatio-Temporal Planning from a Dynamic Programming Teacher: Feed-Forward Neurocontrol for Moving Obstacle Avoidance
Within a simple test-bed, application of feed-forward neurocontrol for short-term planning of robot trajectories in a dynamic environment is studied. The action network is embedded in a sensorymotoric system architecture that contains a separate world model. It is continuously fed with short-term predicted spatio-temporal obstacle trajectories, and receives robot state feedback. The action net ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1996